Replication File for:

Lu, Yingdan and Jennifer Pan. 2020. "Capturing Clicks: How the Chinese Government Uses Clickbait to Compete for Visibility." Political Communication. Forthcoming.


** Citation **

Yingdan Lu and Jennifer Pan. 2020. "Capturing Clicks: How the Chinese Government Uses Clickbait to Compete for Visibility." Political Communication, forthcoming.

@article{LuPan2020,
author = {Lu, Yingdan and Pan, Jennifer},
journal = {Political Communication},
title = {{Capturing Clicks: How the Chinese Government Uses Clickbait to Compete for Visibility}},
volume = {(forthcoming)},
year = {2020}
}


** Notes **

Please set the working directory to the directory where this ReadMe.txt is located.


** Code files **
 (in the "code" folder)

1) desc.R 
-- Descriptive statistics of data collection (government posts, government accounts, and non-government posts)
-- Output: Figure 2, Appendix Figure A1, Appendix Table A2

2) topic_modeling.R
-- Structural Topic Model (STM)
-- Output: Figure 3, Appendix Figure A2, Appendix Table A3

3) clickbait.R
-- Compare clickbait usage between government and non-government accounts; compare clickbait vs emotional appeals
-- Output: Figure 4, Figure 5, Figure 6

4) visibility.R
-- Analyze relationship between clickbait and views, clickbait and likes, as well as clickbait and account performance
-- Output: Table 1/Appendix Table A4, Table 2/Appendix Table A5, Table 3, Appendix Figure A3


** Datasets used in code **
 (in the "data" folder)

1) total_posts.csv
-- 197,303 titles of posts made by 213 city-government WeChat Official Accounts from May 25, 2018 to May 25, 2019. Scraped from Sogou Weixin (https://weixin.sogou.com/).
-- variables:
(1) account_name: name of the city government fabu (发布) account
(2) title: title of the post
(3) time_pek: time when the post was released
(4) date_pek: date when the post was released

2) sample_posts.csv
-- 58,711 titles of posts randomly sampled from total_posts.csv, stratified by city-government WeChat Official Account.
-- variables:
(1) account_name: name of the city government fabu (发布) account
(2) title: title of the WeChat post
(3) time_pek: time when the post was released
(4) date_pek: date when the post was released
(5) life_guidance: whether the post contains practical guidance on people's livelihood that is unrelated to politics, 1 = yes, 0 = no
(6) listicles: whether the title contains at least one listicle, 1 = yes, 0 = no
(7) gennn: whether the title contains at least one general noun, 1 = yes, 0 = no
(8) gennn_word: the general noun that the title contains
(9) hyperbolic: whether the title contains at least one hyperbolic word, 1 = yes, 0 = no
(10) hyperbolic_word: the hyperbolic word(s) that the title contains, multiple hyperbolic words separated by a comma (,)
(11) slang: whether the title contains at least one slang term, 1 = yes, 0 = no
(12) slang_word: the slang term(s) that the title contains, multiple slang terms separated by a comma (,)
(13) joy: whether the title contains the emotion of joy as an emotional appeal, 1 = yes, 0 = no
(14) pride: whether the title contains the emotion of pride as an emotional appeal, 1 = yes, 0 = no
(15) anger: whether the title contains the emotion of anger as an emotional appeal, 1 = yes, 0 = no
(16) fear: whether the title contains the emotion of fear as an emotional appeal, 1 = yes, 0 = no
(17) vision: whether the title contains the vision appeal, 1 = yes, 0 = no
(18) warmth: whether the title contains the emotion of warmth as an emotional appeal, 1 = yes, 0 = no
(19) excl_mark: the number of exclamation marks used in the title
(20) question_mark: the number of question marks used in the title
(21) ellipsis_mark: the number of ellipsis marks used in the title
(22) total_mark: the number of all marks used in the title
(23) phrases_num: the number of fixed phrases used in the title
(24) phrases: the fixed phrase(s) used in the title, multiple phrases separated by a comma (,)
(25) pronoun_num: the number of pronouns used in the title
(26) pronoun: the pronoun(s) used in the title, multiple pronouns separated by a comma (,)
(27) reads: the number of "Reads" (阅读) the post received as of September 10, 2019
(28) likes: the number of "Wows" (在看) the post received as of September 10, 2019.
Note: between December 2018 and March 2019, WeChat replaced the previous like feature “Praise” (赞) with the new like feature “Wow” (在看) through two software updates. When the “Wow” feature was implemented, previous “Praise” data was no longer displayed on WeChat. As a result of this change in WeChat’s technical features, our data on likes consists of “Wows” on titles posted between March 14, 2019 and May 25, 2019.
(29) job: whether the post contains information about job openings, 1 = yes, 0 = no
(30) gov: whether the post contains information about government administrative activities, 1 = yes, 0 = no
(31) leader: whether the post contains information about leader activities, 1 = yes, 0 = no
(32) fame: whether the post contains information about local claims to fame, 1 = yes, 0 = no
(33) ideo: whether the post contains propaganda of central ideology, 1 = yes, 0 = no
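
The comma-delimited word columns and the restricted window for likes described above can be handled along the following lines. This is a minimal Python sketch and not part of the replication code, which is written in R. The helper names (split_words, likes_available) are hypothetical. Treating "NA" as an empty cell, accepting a full-width comma, and treating the March 14 to May 25, 2019 window as inclusive are all assumptions rather than documented properties of the data.

```python
from datetime import date

# Window in which "Wow" counts are available, per the note on the likes variable.
# Treating both end dates as inclusive is an assumption.
WOW_START = date(2019, 3, 14)
WOW_END = date(2019, 5, 25)


def split_words(cell):
    """Split a comma-delimited cell (e.g. hyperbolic_word, slang_word,
    phrases, pronoun) into a list of individual entries."""
    # Assumption: an empty cell or the string "NA" means no entries.
    if cell is None or cell.strip() in ("", "NA"):
        return []
    # Assumption: a full-width comma may appear alongside the ASCII comma.
    parts = cell.replace("，", ",").split(",")
    return [p.strip() for p in parts if p.strip()]


def likes_available(post_date):
    """Return True if a post released on post_date falls inside the window
    for which the likes variable records "Wows"."""
    return WOW_START <= post_date <= WOW_END
```

For posts released outside this window, the likes variable should probably be treated as missing rather than as zero.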

3) nongov_posts.csv
-- 1,607 titles of posts collected from three non-government WeChat Official Accounts (Dingxiang Doctor, Zhanhao, and Lifeweek).
-- variables:
(1) account_name: name of the account
(2) title: title of the WeChat post
(3) date_pek: date when the post was released (mm/dd/yyyy)
(4) reads: the number of "Reads" the post received as of December 30, 2019
(5) likes: the number of "Wows" the post received as of December 30, 2019. Due to the same change in WeChat’s technical features, our data on likes consists of “Wows” on titles posted between March 14, 2019 and May 25, 2019.
(6) excl_mark: the number of exclamation marks used in the title
(7) question_mark: the number of question marks used in the title
(8) ellipsis_mark: the number of ellipsis marks used in the title
(9) total_mark: the number of all marks used in the title
(10) phrases_num: the number of fixed phrases used in the title
(11) phrases: the fixed phrase(s) used in the title, multiple phrases separated by a comma (,)
(12) hyperbolic: whether the title contains at least one hyperbolic word, 1 = yes, 0 = no
(13) hyperbolic_word: the hyperbolic word(s) that the title contains, multiple hyperbolic words separated by a comma (,)
(14) slang: whether the title contains at least one slang term, 1 = yes, 0 = no
(15) slang_word: the slang term(s) that the title contains, multiple slang terms separated by a comma (,)
(16) pronoun_num: the number of pronouns used in the title
(17) pronoun: the pronoun(s) used in the title, multiple pronouns separated by a comma (,)
(18) listicles: whether the title contains at least one listicle, 1 = yes, 0 = no
(19) gennn: whether the title contains at least one general noun, 1 = yes, 0 = no
(20) gennn_word: the general noun that the title contains
(21) joy: whether the title contains the emotion of joy as an emotional appeal, 1 = yes, 0 = no
(22) pride: whether the title contains the emotion of pride as an emotional appeal, 1 = yes, 0 = no
(23) anger: whether the title contains the emotion of anger as an emotional appeal, 1 = yes, 0 = no
(24) fear: whether the title contains the emotion of fear as an emotional appeal, 1 = yes, 0 = no
(25) vision: whether the title contains the vision appeal, 1 = yes, 0 = no
(26) warmth: whether the title contains the emotion of warmth as an emotional appeal, 1 = yes, 0 = no
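
Because date_pek in this file is stored as text in mm/dd/yyyy form, it needs to be parsed before posts can be sorted or filtered by date. The following is a minimal Python sketch and not part of the replication code, which is written in R. The function name parse_date_pek is hypothetical.

```python
from datetime import date, datetime


def parse_date_pek(text):
    """Parse a date_pek string in mm/dd/yyyy form into a date object."""
    # strptime raises ValueError if the text does not match the format,
    # which surfaces malformed rows instead of silently misreading them.
    return datetime.strptime(text.strip(), "%m/%d/%Y").date()
```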

4) city_data.csv
-- City-level indicators gathered from the 2018 China City Statistical Yearbook published by the National Bureau of Statistics of China (http://www.stats.gov.cn/tjsj/ndsj/2018/indexeh.htm) and WeChat Communication Index scores of 213 city-government WeChat Official Accounts collected from Qingbo Big Data Corporation (http://www.gsdata.cn/site/usage).
-- variables:
(1) province: province that the city is a part of
(2) city: city that the account is affiliated with
(3) account_name: name of the city-government fabu (发布) account
(4) Population: population of the city in 2018 (unit: 10000 people)
(5) GDP: Gross Domestic Product (unit: 100 million yuan)
(6) GDPPerCap: GDP per capita (unit: yuan)
(7) PBExp: government public budgetary expenditure (unit: 100 million yuan)
(8) GrossArea: gross area (unit: square km)
(9) Internet/DSL: Internet subscribers (unit: 10000 households)
(10) Phone: mobile subscribers (unit: 10000 households)
(11) Tourism: inbound tourists (unit: 10000 people)
(12) registration: date when the account started posting (mm/dd/yyyy)
(13) WCI: WeChat Communication Index score
(14) Account_affiliation: department/institution the account is affiliated with, 1 = administrative office of the local government; 2 = information office of the local government; 3 = local party committee propaganda department; 4 = local cyberspace administration office; 5 = local official media outlets or media companies; 6 = private company; 7 = other government department / office
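
Several of the variables above are recorded in scaled units (10000 people, 100 million yuan), and Account_affiliation is stored as a numeric code. The following minimal Python sketch shows one way to convert them. It is not part of the replication code, which is written in R, and all function names are hypothetical. The implied per-capita figure is only a rough consistency check and may not match GDPPerCap exactly, since the yearbook may compute that variable from a different population base.

```python
# Labels for the Account_affiliation codes, copied from the variable description.
AFFILIATION_LABELS = {
    1: "administrative office of the local government",
    2: "information office of the local government",
    3: "local party committee propaganda department",
    4: "local cyberspace administration office",
    5: "local official media outlets or media companies",
    6: "private company",
    7: "other government department / office",
}


def population_in_persons(population):
    """Convert Population (unit: 10000 people) to a count of people."""
    return float(population) * 10_000


def gdp_in_yuan(gdp):
    """Convert GDP (unit: 100 million yuan) to yuan."""
    return float(gdp) * 100_000_000


def implied_gdp_per_capita(gdp, population):
    """GDP per capita in yuan implied by the GDP and Population columns."""
    return gdp_in_yuan(gdp) / population_in_persons(population)


def affiliation_label(code):
    """Map an Account_affiliation code (1 to 7) to its description."""
    return AFFILIATION_LABELS[int(code)]
```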

5) stopword.txt
-- Stop word list used in STM; combines several mainstream stop-word lists (which can be downloaded at https://github.com/goto456/stopword), revised based on WeChat text features; includes 1,075 stop words.


** Other data used in paper **
 (in the "data" folder)

1) fixed-phrase.txt
-- Dictionary with 54 fixed-pattern words derived from n-gram analysis and human validation.

2) hyperbolic.txt
-- Dictionary with 479 hyperbolic words derived from human open-coding.

3) general_nn.txt
-- Dictionary with 39 general nouns derived from human open-coding.

4) slang.txt
-- Dictionary with 1,857 slang words derived from human open-coding and the Sogou Cell Thesaurus: https://pinyin.sogou.com/dict/.

5) pronoun.txt
-- Dictionary with 67 pronouns derived from parts-of-speech tagging (#PN tags).

